Large Scale Language Independent Generation Using Thematic
Author
Abstract
This paper describes a large-scale language-independent evaluation of the use of Thematic Hierarchies in natural language generation. We translate from a corpus of sentences reflecting the full variety of behavior of Levin-based verb classes. The corpus is used as input to a generation system that utilizes the same thematic hierarchy for realizing relative argument surface positions in two languages: English and Spanish. The output was manually evaluated by English and Spanish speakers. The contributions of this work include: (1) an improved thematic hierarchy over an earlier implementation; (2) a large-scale evaluation of the use of thematic hierarchies in two languages; (3) an implementation of a language-independent module for natural language generation; and (4) the creation of a single tool for incremental development of multilingual lexicons.

1 Motivation

In (Dorr et al., 1998), an implementation of thematic hierarchies for efficient natural language generation was presented. The use of the thematic hierarchy was evaluated using a small hand-constructed corpus of 100 English sentences reflecting a variety of English verb classes and alternations. The hierarchy was implemented using cascading rules within the grammar formalism provided as part of the natural language realization engine Nitrogen (Langkilde and Knight, 1998a; Langkilde and Knight, 1998b). Some of the shortcomings of this earlier work are: (1) inadequate evaluation due to the use of a small test corpus; (2) limitation of the approach to one language only (English); (3) lack of a principled design in the implementation. This paper presents a more systematic implementation of thematic hierarchies and a large-scale evaluation of their use for generation in English and Spanish. This evaluation was helpful in the incremental development of both the thematic hierarchy and the English and Spanish lexicons.
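The idea of using one thematic hierarchy to fix the relative surface positions of arguments in more than one language can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the role names, the ranking in HYPOTHETICAL_HIERARCHY, and the helper functions order_arguments and realize are hypothetical, and they are neither the hierarchy described in the paper nor the Nitrogen grammar formalism.

```python
# Toy illustration only: the ranking below is a hypothetical ordering,
# NOT the thematic hierarchy used in the paper.
HYPOTHETICAL_HIERARCHY = ["agent", "instrument", "theme", "goal", "location"]


def order_arguments(args, hierarchy=HYPOTHETICAL_HIERARCHY):
    """Sort (role, phrase) pairs by the rank of their role in the hierarchy.

    Roles that are not in the hierarchy are placed last; Python's sort is
    stable, so ties keep their input order.
    """
    rank = {role: i for i, role in enumerate(hierarchy)}
    return sorted(args, key=lambda pair: rank.get(pair[0], len(hierarchy)))


def realize(verb, args, hierarchy=HYPOTHETICAL_HIERARCHY):
    """Toy realization: the highest-ranked argument precedes the verb,
    and the remaining arguments follow it in hierarchy order."""
    ordered = order_arguments(args, hierarchy)
    if not ordered:
        return verb
    subject_phrase = ordered[0][1]
    other_phrases = [phrase for _, phrase in ordered[1:]]
    return " ".join([subject_phrase, verb] + other_phrases)


# The same ordering function is reused for both languages; only the
# lexical items (hypothetical examples) differ.
english = realize("broke", [("theme", "the window"), ("agent", "John")])
spanish = realize("rompió", [("theme", "la ventana"), ("agent", "Juan")])
print(english)
print(spanish)
```

In this sketch the language-specific part is limited to the words themselves, while the ordering logic is shared, which is the sense in which such a module could be called language independent. A real system would also need language-specific word order, agreement, and function words, none of which are modeled here.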
The work presented here is part of the generation component (Traum and Habash, 2000) of the interlingual Machine Translation effort at the University of Maryland, College Park. The generation component has also been used in Cross-Language Information Retrieval research (Levow et al., 2000). The interlingual representation used is Lexical Conceptual Structure (LCS), a compositional abstraction with language-independent properties that tran…

One of the major challenges in natural language processing is the ability to make use of existing resources. Large differences in syntax, semantics, and ontologies of such resources create significant barriers to their usage in large-scale applications. A case in point is the wide range of "interlingual representations" used in machine translation and cross-language processing. Such representations are becoming increasingly prevalent, yet views …
Similar resources
MAN-MACHINE INTERACTION SYSTEM FOR SUBJECT INDEPENDENT SIGN LANGUAGE RECOGNITION USING FUZZY HIDDEN MARKOV MODEL
Sign language recognition (SLR) has attracted growing interest in the human–computer interaction community. The major challenge SLR now faces is developing methods that scale well with increasing vocabulary size given a limited set of training data for signer-independent applications. Automatic SLR based on hidden Markov models (HMMs) is very sensitive to gesture's shape inf...
Latent Linguistic Codes for Morphemes Using Independent Component Analysis
We study properties of morphemes by analyzing their use in a large Finnish text corpus using Independent Component Analysis (ICA). As a result, we obtain emergent linguistic representations for the morphemes. On a coarse level, main syntactic categories are observed. On a more detailed level, the components depict potential thematic roles of the morphemes. An interesting question is whether the...
Construction of Chinese-English Semantic Hierarchy for Information Retrieval
This paper describes an approach to large-scale construction of a semantic hierarchy for Chinese verbs. Leveraging off of an existing Chinese conceptual database called HowNet and a Levin-based English verb classification, we use thematic-role information to create links between Chinese concepts and English classes. The resulting hierarchy is used for multilingual lexicons in an English-Chinese c...
Journal title:
Volume, issue:
Pages: -
Publication date: 2001